DOCUMENT RESUME 



ED 079 402 



TM 003 G06 



AUTHOR       Woodson, M. I. Charles E.
TITLE        The Issue of Item and Test Variance for
             Criterion-Referenced Tests.
NOTE         5p.

EDRS PRICE   MF-$0.65 HC-$3.29
DESCRIPTORS  *Criterion Referenced Tests; *Evaluation Criteria;
             Item Analysis; *Item Sampling; Statistical Analysis;
             *Test Construction; Testing; Test Interpretation



ABSTRACT 

It has been argued that item variance and test
variance are not necessary characteristics for criterion-referenced
tests, although they are necessary for norm-referenced tests. This
position is in error because it considers sample statistics as the
criteria for evaluating items and tests. Within a particular sample,
an item or test may have no variance, but in the population for which
the test was designed and evaluated, both items and tests must have
variance. (Author)
It has been argued that item variance and test variance are not
necessary characteristics for criterion-referenced tests, although they
are necessary for norm-referenced tests. This position is in error
because it considers sample statistics as the criteria for evaluating
items and tests. Within a particular sample, an item or test may have
no variance, but in the population for which the test was designed and
evaluated, both items and tests must have variance.

Popham and Husek (1969) have argued that the test items for
criterion-referenced tests and the tests themselves (Popham, personal
communication) may have no variance, and therefore traditional methods
of empirical evaluation of test items and tests are invalid for
criterion-referenced tests. Popham and Husek (1969) conclude, "With
criterion-referenced tests, variability is irrelevant ....
Variability is not a necessary condition for a good criterion-referenced
test."



Consider the example of the ideal outcome of a perfect instructional
procedure: before instruction everyone misses all items; after instruction
everyone gets all items correct. This ideal outcome has been referred to
as showing no item variance.

The basic flaw in this argument is that it fails to consider the
question of what generalizations are to be made from the observations.
Popham and Husek were speaking about sample statistics, but statistics for
evaluating items or test characteristics refer to the population of
observations for which the instrument was designed and calibrated. The
population of observations for which an item is calibrated must be the
reference for the evaluation of an item. In classical test theory (Lord
and Novick, 1968) for norm-referenced tests, item analysis, test development,
and test validation must be done on observations of a sample representative
of the population for which the test will be used. For criterion-referenced
tests, item analysis and test development must be done on observations
representative of the observations within the range of interest on the
characteristic of interest. Referring to the ideal outcome above, the
range of possible observations of interest includes the observation of
passing no items as well as that of passing all items.

Statistics from a particular sample of observations, perhaps restricted
in some way, do not give us a definitive answer about what discrimination
in the population of observations may be. If the sample is considerably
restricted, the estimates of test parameters will be influenced. Items
may have no variance in a restricted sample, e.g., individuals who have
finished an instructional program, and yet be useful items because they
do have variance within the population for which they are calibrated.
The extreme case in which an item is missed by all subjects on the pre-test
and answered correctly by all subjects on the post-test is, in fact, an
example of the maximum variance for an item within the sample of observations
collected. In this case we have data from what appears to be two extreme
points on the characteristic of interest.
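
The contrast between a restricted sample and the full set of observations can be illustrated numerically. The following is a minimal sketch, not part of the original paper: the group size of 20 and the function name item_variance are assumptions made for illustration, and the item is assumed to be scored dichotomously (1 = correct, 0 = missed), so that its variance is p(1 - p), where p is the proportion answering correctly.

```python
# Hypothetical data for the "ideal outcome" described above.
# 1 = item answered correctly, 0 = item missed. Group size is an assumption.
pre_test = [0] * 20    # before instruction, every subject misses the item
post_test = [1] * 20   # after instruction, every subject passes the item


def item_variance(scores):
    """Variance of a dichotomously scored item: p * (1 - p)."""
    p = sum(scores) / len(scores)  # proportion answering correctly
    return p * (1 - p)


# Restricted samples: each group taken alone shows no item variance.
var_pre = item_variance(pre_test)
var_post = item_variance(post_test)

# All observations collected: pre-test and post-test pooled together.
var_pooled = item_variance(pre_test + post_test)

print("pre-test only: ", var_pre)
print("post-test only:", var_post)
print("pooled:        ", var_pooled)
```

Under these assumptions each restricted sample yields a variance of zero, while the pooled observations yield 0.25, the largest value p(1 - p) can take for a dichotomous item.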

In classical theory, item analysis seeks to answer the question,
"Does the item discriminate on the characteristic measured within the
distribution of scores for the population of interest?" The reference
here is to differences among persons of a population. Item analysis in
this case requires observations on a sample representative of this population.



Criterion-referenced item analysis seeks an answer to the question,
"Does the item discriminate within the range of interest on the
characteristic measured?" The reference here is to different observations
on the characteristic. Item analysis in this case requires observations at
different points on the characteristic.
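
One simple way to examine whether an item discriminates between two points on the characteristic is to compare the proportion answering correctly at each point. The sketch below is an illustration only and is not a procedure given in the paper: the difference-in-proportions index, the function names, and the data for the two hypothetical items are all assumptions.

```python
def proportion_correct(scores):
    """Proportion of observations in which the item was answered correctly."""
    return sum(scores) / len(scores)


def discrimination_between_points(low_scores, high_scores):
    """Difference in proportion correct between two points on the
    characteristic (an assumed, illustrative index)."""
    return proportion_correct(high_scores) - proportion_correct(low_scores)


# Hypothetical observations at two points: before and after instruction.
item_a_pre = [0, 0, 0, 1, 0, 0, 0, 0]
item_a_post = [1, 1, 1, 1, 1, 0, 1, 1]

# A hypothetical item that everyone passes at both points.
item_b_pre = [1, 1, 1, 1, 1, 1, 1, 1]
item_b_post = [1, 1, 1, 1, 1, 1, 1, 1]

d_a = discrimination_between_points(item_a_pre, item_a_post)
d_b = discrimination_between_points(item_b_pre, item_b_post)

print("item A:", d_a)  # separates the two points
print("item B:", d_b)  # no variability across the range, no information
```

With these made-up data, item A distinguishes the two points on the characteristic, whereas item B gives the same result everywhere in the range and so cannot distinguish them.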

In either case, item variance and discrimination are essential. In
short, (1) items and tests must be evaluated for the range of the
characteristic for which they will be used, and (2) items and tests which
give no variability in this population of observations give no information
and are therefore not useful.